Modeling difference x-ray scattering observations from an integral membrane protein within a detergent micelle

Time-resolved x-ray solution scattering (TR-XSS) is a sub-field of structural biology, which observes secondary structural changes in proteins as they evolve along their functional pathways. While the number of distinct conformational states and their rise and decay can be extracted directly from TR-XSS experimental data recorded from light-sensitive systems, structural modeling is more challenging. This step often builds from complementary structural information, including secondary structural changes extracted from crystallographic studies or molecular dynamics simulations. When working with integral membrane proteins, another challenge arises because x-ray scattering from the protein and the surrounding detergent micelle interfere and these effects should be considered during structural modeling. Here, we utilize molecular dynamics simulations to explicitly incorporate the x-ray scattering cross term between a membrane protein and its surrounding detergent micelle when modeling TR-XSS data from photoactivated samples of detergent solubilized bacteriorhodopsin. This analysis provides theoretical foundations in support of our earlier approach to structural modeling that did not explicitly incorporate this cross term and improves agreement between experimental data and theoretical predictions at lower x-ray scattering angles.


INTRODUCTION
An accurate measurement of protein conformational changes as they evolve with time has long been a goal of structural biology. 1 Serial crystallography approaches, first implemented at x-ray free electron lasers 2 (XFELs), have revolutionized the field of time-resolved crystallography since numerous technical challenges that limited the sphere of application of time-resolved Laue diffraction were overcome. [3][4][5] The critical benefit of time-resolved crystallography is that a successful study delivers high-resolution structural insight into the chemical details of an evolving reaction from sub-picoseconds to minutes. Recent developments in single-particle cryo-electron microscopy can also provide structural insight into the progress of a biological reaction at high-resolution, 6,7 and real-time NMR studies of protein dynamics have yielded insights into protein unfolding and refolding. 8 As methods for characterizing protein conformational changes become increasingly mainstream, a full appreciation of the relationship between structure and function must increasingly incorporate structural dynamics.
Although time-resolved crystallography is undergoing a period of rapid growth, 4 a fundamental limitation of time-resolved crystallographic techniques is that samples must both crystallize and remain in a highly ordered arrangement as the reaction evolves. It is not always possible to recover well diffracting crystals for any chosen system of study and a reaction of interest may be slowed in the crystalline state. 4,9,10 Moreover, crystallographic methods cannot visualize protein conformational changes that are incompatible with the crystal packing lattice since they are either suppressed or they disrupt the crystalline order. Time-resolved x-ray solution scattering (TR-XSS) studies of protein structural changes have been developed to address these limitations. [11][12][13][14][15][16][17][18][19][20][21][22][23][24][25][26][27][28][29] The key advantage of TR-XSS is that it provides a relatively direct measurement of the timescale and number of species with distinct secondary structural characteristics. Conversely, since the sample is a randomly oriented ensemble of protein molecules in solution, the measured information is only one-dimensional. As such, TR-XSS is not as information rich as time-resolved x-ray diffraction and structural modeling relies upon low-resolution shape reconstructions 22,26,27 or leans heavily upon additional information that may be extracted from crystallographic structures 11,12,15,18,25 or molecular dynamics trajectories. [13][14][15]21,23,24,[27][28][29] Structural interpretation of TR-XSS data recorded from integral membrane proteins 12,18,25,28,29 is further complicated by the fact that a detergent micelle, which surrounds the integral membrane protein, 30 frequently contains more atoms than the protein itself and is highly disordered. Consequently, difference x-ray scattering data extracted from TR-XSS experiments rely upon structural fluctuations within the detergent micelle averaging out due to the very large number of molecules, which are exposed to an x-ray beam in any experiment. For example, at a concentration of 0.5 mM, approximately 10 12 protein molecules are found in a 3.6 nl sample volume, which is the volume calculated from an x-ray beam 60 Â 120 lm 2 passing through a 500 lm 2 capillary. 12 It is, however, not computationally feasible to perform simulations of 10 12 membrane-protein/micelle systems, and therefore, approximations are required. In our earlier modelling, 12,18,24,25 we argued that the influence of the membrane protein-micelle cross term on the x-ray scattering differences do not compromise the structural conclusions drawn from x-ray scattering difference data in the domain 0.2 Å À1 q 1.0 Å À1 . Here, we examine these assumptions more critically by fitting TR-XSS data from bacteriorhodopsin (bR) 12 against structures extracted by combining structural results from time-resolved x-ray crystallography 31,32 and molecular dynamics simulations that explicitly include a detergent micelle. This analysis highlights the benefits and shortcomings of our earlier approach and provides theoretical foundations for extending the structural analysis of TR-XSS data to a larger set of integral membrane proteins.

THEORY
For any sample consisting of j atoms, the x-ray scattering from the sample is given by summing the x-ray scattering contributions from each of the atoms of the sample with a phase-factor associated with the spatial location of each atom relative to an arbitrary origin, 33 F q ð Þ ¼ X j f j exp iq Á r j ð Þ; (1) where q is a 3D vector in reciprocal space corresponding to the change in momentum of the scattered x-ray and has a magnitude q ¼ 4p sin h=k; k is the x-ray wavelength; h is half the angle through which the x-ray beam is deflected; F q ð Þ is a complex number called a structure factor that is specified for all values of q; f j is the atomic scattering factor of atom j of the sample; and r j is the 3D coordinate in real space of atom j. The intensity of the scattered x-ray beam for any vector q is proportional to the magnitude squared of the structure factor, If samples are assumed to be monodisperse in solution, it is possible to average over all random orientations of molecules to recover, where hÁ Á Ái denotes rotational averaging in space and yields an isotropic x-ray scattering distribution (i.e., depends only upon the magnitude of q). When summing over all N molecule in the sample, we have made the assumption that the interference between separate molecules may be neglected. 33 Several packages have been written which build upon this formula to analyze x-ray scattering data. In this work, we utilize the package CRYSOL 34 for x-ray scattering calculations. CRYSOL evaluates Eq. (3) using the atomic coordinates of macromolecules as input and a utilizes a multipole expansion to calculate the spherically averaged scattering pattern. When working with integral membrane proteins surrounded by a detergent micelle, we can divide the protein and micelle contributions of Eq. (1) to become for which F p q ð Þ represents the structure factor calculated from the protein alone; F m q ð Þ represents the structure factor calculated from the detergent micelle alone; and F pm q ð Þ is the structure factor calculated from both combined. Substituting Eq. (4) into Eq. (3) and performing rotational averaging over samples in solution, and noting that hjF q ð Þ j 2 i is a real number, gives where this expansion is used to define a new variable, U pm ðqÞ, which we introduce to in order to separate the protein and detergent micelle scattering terms within this formalism. As such, cosU pm ðqÞ can be evaluated by inverting Eq. (5) and utilizing Eq. (3) to recover, where S p q ð Þ is the isotropic solution scattering intensity calculated from the protein alone, S m q ð Þ is the isotropic solution scattering intensity calculated from the detergent micelle alone, and S pm q ð Þ is the isotropic solution scattering intensity calculated from both protein and micelle combined. In practice, we calculate cos U pm q ð Þ by writing separate pdb files containing all of the atoms of the system (Fig. 1), another containing the atoms of the protein alone, and another containing the atoms of the detergent micelle alone. We then calculate the x-ray solution scattering intensities from the corresponding pdb files using CRYSOL. A representative example of cos U pm q ð Þ calculated from bacteriorhodopsin in the detergent b-octyl glucoside [ Fig. 1(a)] is given in Fig. 1(b). It can be seen that cos U pm q ð Þ varies smoothly with q between the values 1 and À1; cos U pm 0 ð Þ is unity due to the fact that all scattering in the forward direction is in phase; and at higher scattering angles, cos U pm q ð Þ ! 0 as the x-ray scattering from the atoms of the protein and micelle lose phase coherence.
In TR-XSS studies, the most important experimental observation is the change in x-ray scattering as a function of x-ray scattering angle and time, DS expt ðq; DtÞ. When modeling protein conformational changes, we are, therefore, concerned with how changes in the structure of the integral membrane protein lead to changes in the measured x-ray scattering amplitudes. From Eq. (5), and using the notation S p q ð Þ in favor NhjF p q ð Þ j 2 i, etc…, we find that if structure of the membrane protein or detergent micelle changes then corresponding change in x-ray scattering intensities will be where DS p q ð Þ is the difference between the x-ray scattering predictions generated from one protein structural model (the candidate structure) and the initial structural model (the resting conformation) and DS m q ð Þ is the difference between the x-ray scattering predictions generated from one micelle structural model and another. Likewise, D cos U pm q ð Þ Â is calculated as the change in cos U pm ðqÞ predicted by Eq. (6) when comparing two different molecular conformations of the entire system. We also used the chain rule to make the approximation D  , which indicates the relative phase coherence between the x-ray scattering from the protein and detergent micelle: cos U pm q ð Þ ¼ 1 means the x-ray scattering is in phase; cos U pm ðqÞ ¼ À1 is out of phase; and cos U pm ðqÞ ! 0 indicates little phase coherence. (c) Movements of Ca-atoms of helix C from Pro77 to Pro 91 extracted from pdb entries 5B6Z minus 5B6V; and movements of Ca-atoms for helices E and F from Leu152 to Pro186 extracted from pdb entries 6RPH minus 6RQP, with adjustments in the loop region. These coordinates were then used as targets for molecular dynamics simulations using GROMACS according to Eq. (12). (d) A snapshot from a GROMACS simulation of the resting conformation of bR within a detergent micelle. (e) A snapshot from a GROMACS simulation of the light-activated conformation of bR within a detergent micelle. (f) Creation of a pdb file with a snapshot of the protein from the resting conformation trajectory placed within a snapshot of the micelle from the light-activated conformation trajectory. (g) Creation of a pdb file with a snapshot of the protein from the light-activated trajectory placed within a snapshot of the micelle from the resting conformation trajectory. is proportional to the x-ray scattering changes associated with structural changes within the detergent micelle and is also modulated due to the scattering cross term between the protein and membrane. The final term, 2 ffiffiffiffiffiffiffiffiffiffi ffi S p q ð Þ p ffiffiffiffiffiffiffiffiffiffiffi ffi S m q ð Þ p D½cos U pm q ð Þ , reflects how changes in the positions of the atoms of the protein and the micelle influence the effective phase coherence of x-ray scattering cross term between the protein and micelle. In our previous analyses 12,18,24,25 we effectively assumed that both DS m ðqÞ and D½cosU pm q ð Þ were vanishingly small outside of the small-angle scattering domain and, therefore, only the first term of Eq. (7) needed to be considered during structural fitting. In this work we use the analysis of multiple conformations sampled in molecular dynamics trajectories of the protein within a detergent micelle to assess the validity of these assumptions and, consequently, to extend the modeling formalism as appropriate.
When comparing theoretical predictions with experimental data, it is also essential to determine the influence of the solvent volume displaced by both the protein and micelle, which is usually referred to as the solvent-excluded volume. 33 If F solv q ð Þ denotes the x-ray scattering from the solvent-excluded volume due to the presence of the protein and detergent micelle then we can write the total x-ray scattering after correcting for the solvent-excluded volume, F tot q ð Þ , as where the negative sign preceding F solv q ð Þ arises from the fact that the x-ray scattering from the solvent-excluded volume must be subtracted. Limiting ourselves to consider only isotropic solution x-ray solution scattering from randomly oriented molecules [Eq. (3)] and applying the same logic as above, allows us to recover, which, in turn, defines When predicting x-ray scattering changes as an integral membrane protein changes conformation, we cannot reliably predict the changes in structure of the solvent-excluded volume. We instead assume that both the resting and altered conformations are equally modulated by solvent-exclusion effects and Eq. (9) can then be used to calculate the ratio S tot ðqÞ=S pm ðqÞ for all values of q. With this rationale, we make the approximation that the theoretical predictions, DS pm q ð Þ , calculated from Eq. (7) should be scaled according to in order to compare theoretical predictions against experimental difference x-ray scattering data.

TR-XSS data collection
Detergent solubilized samples of bR were prepared as previously described. 12 TR-XSS data ( Fig. 2) were recorded at the beamline ID09 of the ESRF using polychromatic x-rays following photo-excitation with a green (532 nm) laser pulse 4 ns in duration. TR-XSS data were recorded for the time delays Dt ¼ 5.25, 13.8, 36.2, 95.2, 250, 657 ls, 1.725, 4.54 12.0, and 31.4 ms between the arrival of the 532 nm photoactivation laser pulse and the x-ray probe. Results from these measurements were compared with earlier TR-XSS measurements from bR 12 also recorded using a 532 nm laser pulse for photoactivation but 100 ns in duration, and for the time-delays Dt ¼ 360, 800 ns, 2, 20, 65, 200, 650 ls, 2, 20, and 100 ms. In our previous TR-XSS studies, we used a physical translation of the goniometer to translate samples between each laser flash, 12 whereas in these studies, we flowed the sample through a capillary and thereby replaced the sample between each and every laser flash. In this work, we further recorded XSS data during the photo-excitation of bR samples using continuous illumination with an infrared (IR) laser with a wavelength of 1470 nm, which allowed samples of solubilized bR to be heated without initiating the bR photocycle. In this manner, we recorded directly the influence of sample heating on different XSS spectra [ Fig. 2

(a)].
In our earlier analysis, 12 we assumed that the time-delay Dt ¼ 100 ms represented the difference spectrum, DS(q), arising purely from the effects of heating. This assumption, however, was an approximation since a superposition of the difference scattering data from 2009 for the 100 ms time-point, DS 2009 (q, Dt ¼ 100 ms), with the difference scattering recorded here using IR excitation, DS 2017 (q, IR), shows a significant discrepancy for q < 0.3 Å À1 [ Fig. 2 . Given the limited time-sampling of these data, this value is close to the timescale for the crossing of two populations (Dt % 13 ls) in a time-resolved serial femtosecond crystallography (TR-SFX) study using microcrystals of bR 31 and which correlated well with the L-to-M spectral transition. Unexpectedly, the amplitude of the difference XSS signal after the removal of heat from the 2017 data was only 40% of the amplitude achieved in the 2009 data. As such, linear decomposition of the TR-XSS difference data from 2017 failed to produce a convincing difference signal for the first component, but the difference XSS signal for the second component was in close agreement (Pearson correlation coefficient of 97%) with that recovered for component 2 from the 2009 data [ Fig. 2(b)]. This reproducibility demonstrates that different sample preparations, which show variations in detergent composition as well as other experimental parameters, do not unduly influence the experimental difference XSS signal.
The choice of an appropriate pump laser fluence in time-resolved diffraction and x-ray scattering studies has received considerable attention. 4,35,36 In these studies, the pump laser fluence and protein concentration differed between the 2009 and 2017 data collection runs. The earlier studies passed 250 lJ/pulse through a spot focused to a FHWM (full width half maximum) of 265 Â 265 lm 2 , amounting to an average pump laser fluence across the FWHM of 227 mJ/cm 2 . Using the procedure described in Ref. 24 (f) Guinier plot, ln½S q ð Þ vs: q 2 , of the total x-ray scattering data at two protein concentrations (35 mM, blue; 25 mM, red). The departure from linearity at low-q suggests some degree of protein aggregation, although this interpretation may be complicated by the presence of a lipid environment. 30,50 Structural Dynamics ARTICLE scitation.org/journal/sdy the probed sample volume to be much lower than at the surface of the capillary. It is, however, difficult to explain why the amplitude of the difference signal extracted from the 2017 study [ Fig. 2(b)] is only 34% of the difference signal observed in 2009, 12 given that the protein concentration in the latter experiment was more than twice that of the earlier study, and therefore, the occupancy of the excited state population recorded in 2017 was only 15% of that achieved in the 2009 data. It is possible that the difference photo-kinetics of the photo-stationary phase (data were collected in 2009 using a 100 ns pump-laser; 12 data were collected in 2017 using a 5 ns pump-laser) may underpin this observation. A Guinier plot calculated for the absolute x-ray scattering at two protein concentrations [ Fig. 2(f)] also shows a departure from linearity, which may indicate sample aggregation. It may, therefore, be counterproductive to push the protein concentration to the point where the optical density of the sample across the flow-cell becomes too large. We, thus, recommend that, in addition to optimizing the spatial overlap between the laser focus and the x-ray beam, time should also be taken to optimize other parameters such as the protein concentration, the flow-cell thickness, the pump laser fluence, and (whenever possible) the pumplaser duration before proceeding to collect TR-XSS data.

Influence of the detergent micelle on difference x-ray scattering
When modeling difference x-ray scattering basis spectra [ Fig. 2 , it is necessary to predict x-ray scattering changes for a set of candidate secondary structural conformational changes. Disagreement between these predictions and the experimental difference spectra is then minimized by sampling various amplitudes and combinations of the chosen motions. From the theoretical developments above, Eq. (7) highlights in addition to the influence of protein conformational changes on the x-ray scattering, DS p ðqÞ, it is important to also model the influence of changes within the micelle on the difference x-ray scattering, DS m ðqÞ, and changes in the phase relationships between the x-ray scattering from the protein and from the detergent micelle, D cos U pm q ð Þ Â . We approach this problem by performing molecular dynamics simulations of bacteriorhodopsin placed within a detergent micelle of b-octyl glucoside [ Fig. 1(a)] using the package GROMACS. 37,38 Candidate motions of a-helices were extracted from published time-resolved serial crystallography studies of bacteriorhodopsin, 31,32 and energy restraints of 1000 kJ/nm 2 were used to drive the protein conformational states of bacteriorhodopsin toward the Ca backbone coordinates describing these candidate motions. Specifically, a vector describing Ca movements within helix C ðDCa C Þ was extracted from pdb entry 5B6Z minus 5B6V using Ca coordinates from Pro77 to Pro91; a vector describing Ca movements within helices E and F ðDCa EF Þ was extracted from pdb entry 6RPH minus 6RQP using Ca coordinates from Leu152 to Pro186; and a perturbation of Ca atoms from Leu211 to Phe219 of helix G associated with retinal isomerization was also incorporated ðDCa G Þ. Moreover, electron density for the EF loop for entry 6RPH was inspected, and for those residues where the electron density was not well defined (residues 157, 158, and 161-167), the DCa EF displacements were determined from a weighted average of the coordinate displacements of the adjacent residues. The amplitudes of helices E and F and the helix C motions were then scaled by the variables c and d; respectively, and the Ca atoms of bacteriorhodopsin were driven toward the coordinates, using energy restraints of 1000 kJ/nm 2 during molecular dynamics simulations [ Fig. 1(c)] of 20 ns (preparative analysis) or 5 ns (fitting runs) in duration using a time step of 2 fs, with pdb files written out for x-ray scattering calculations every 10 ps. Each of these simulations contained 3565 atoms within the protein of which 1816 atoms are hydrogen atoms and 1749 are non-hydrogen atoms; and the micelle contained 190 molecules of b-octyl glucoside with 9120 atoms of which 5320 are hydrogen atoms and 3800 are non-hydrogen atoms. Because the retinal is isomerized for all activated state trajectories, we did not scale DCa G by a variable parameter but rather we kept the amplitude of this motion equal to the change observed using timeresolved x-ray crystallography. Although this assumption could be challenged, the inclusion of too many free parameters during refinement against the 1D difference x-ray scattering data also risks overfitting.
Since the x-ray scattering contribution is proportional to h F p q ð Þ 2 i, the scattering contribution of the detergent micelle is approximately five times larger than that of the protein. Moreover, of the 231 residues within the bR model, only 59 residues (15 in helix C þ 35 in helices EF þ 9 in helix G) had their backbone Ca atoms displaced (i.e., 26% of the protein) according to Eq. (12). As such, we are aiming to predict the impact on the x-ray scattering of Ca movements associated with approximately 26% of the protein, or 8% of the protein plus micelle system. Moreover, whereas molecular dynamics simulations restrain the Ca atoms of the protein, there were no such restraints associated with the atoms of the detergent micelle. As such, to utilize the predictions of Eq. (7), it was essential to develop a strategy for which x-ray scattering changes associated with random fluctuations of the detergent micelle did not completely dominate the predicted changes.
X-ray scattering changes (without solvent correction) arising from CRYSOL calculations from the protein in isolation, DS p ðqÞ of Eq. (7), are shown in Fig. 3(a) as c is varied stepwise from 0 to 1.5 in ten steps of 1 6 and d is varied from 0 to 4 3 in five steps of 1 3 . Despite fluctuations associated with individual conformations along these Carestrained molecular dynamics trajectories, when the principal SVD components of the difference x-ray scattering intensities are extracted from the set of 2001 difference x-ray scattering curves (calculated pairwise, by sampling a perturbed and resting conformation every 100 fs of their respective trajectories), there is a smoothly varying change in the predicted difference x-ray scattering as c and d are varied. This demonstrates that the structural bias imposed by Eq. (12) during Carestrained molecular dynamics simulations is sufficiently powerful to allow quantitative estimates of c and d values when fitting against the experimental difference x-ray scattering spectra. As such the use of energy restraints to drive Ca atoms toward a given target structure extends our earlier modeling of these difference x-ray scattering data using rigid motions of a-helices. 12 Similarly, conformational ensembles extracted from molecular dynamics simulations have been used to fit TR-XSS data from visual rhodopsin 25 and several other approaches using molecular dynamics have also been reported. [13][14][15]21,23,24,[27][28][29] In contrast with the x-ray scattering predictions deriving from the protein in isolation, difference x-ray scattering predictions extracted from the same set of molecular dynamics trajectories but focusing upon changes in x-ray scattering resulting from the micelle Structural Dynamics . This issue arises because there are no restraints on the motion of detergent molecules within the micelle and therefore every trajectory may depart significantly from the starting micelle structure. Moreover, when longer simulations of 100 ns were explored, there was no evidence that the system settled to a steady-state equilibrium. It is thus too computationally expensive to perform sufficiently long simulations in order to average out these fluctuations. Nevertheless, although the difference x-ray scattering curves describing the influence of the term DS m ðqÞ in Eq. (7) do not predict smoothly evolving changes that correlate with c and d, it is possible to use SVD analysis of the fluctuating DS m ðqÞ curves and thereby extract the principal SVD component, which may be incorporated as a low-angle correction to the predicted difference x-ray scattering.
In addition to structural changes within the protein and within the detergent micelle, we must also consider changes in the x-ray scattering cross term between the protein and the detergent micelle as the protein changes structure, summarized in Eq. (7) as the term involving D½cos U pm q ð Þ This cross term is influenced by structural changes in both the protein and the micelle as c and d are varied [ Fig. 1(c)]. To side-step the issue of structural fluctuations within the detergent micelle [Figs. 3(b)-3(e)], we developed the following protocol to extract the influence of D½cos U pm q ð Þ on DS pm ðqÞ arising from conformational changes within the protein alone. As illustrated in Fig. 1, we first generate coordinate files extracted from molecular dynamics trajectories of the protein in its resting conformation (p rest Þ, within a detergent micelle (m rest Þ [ Fig. 1(d)], and generate x-ray scattering predictions, DS prestmrest ðqÞ using CRYSOL. 34 Similarly, we also extract the coordinates from molecular dynamics trajectories of the protein in a modified conformation (p exc Þ, within another detergent micelle ðm exc Þ, and again use CRYSOL to predict DS pexcmexc ðqÞ for every value of c and d [ Fig. 1(e)]. To ensure that the effects of structural fluctuations within the detergent micelle cancel, we next exchanged the coordinates of the detergent micelle to create a set of complementary pdb files that were written with the sampled resting conformations of the protein, p rest , but placed within the sampled micelle conformations of the excited trajectory, m exc , [ Fig. 1(f)]; and conversely for the sampled excited conformation of the protein, p exc , placed within the sampled micelle conformations of the resting trajectory, m rest [ Fig. 1(g)]. From these complementary sets of pdb files, S prestmexc q ð Þ and S pexcmrest q ð Þ were calculated and the expression, was utilized to ensure that the term proportional to DS m ðqÞ of Eq. (7) is zero. Figure 3(f) shows the results of these calculations, revealing a continuous evolution of DS pm ðqÞ as c and d are varied, and is similar to the smooth variation in x-ray scattering seen for the protein alone [ Fig. 3(a)]. In this manner we established a protocol that allows the cross term between the changing protein structure and a detergent micelle to be evaluated from molecular dynamics trajectories, while avoiding the central problem that these predictions become completely dominated by random fluctuations within the detergent micelle. This procedure is therefore utilized for recovering theoretical fits (i.e., best estimates of c and d their respective uncertainties) against the Influence of different sized detergent micelles on difference x-ray scattering Equation (13) provides a working protocol for predicting x-ray scattering changes from a membrane protein within a detergent micelle without these predictions being dominated by random structural fluctuations of detergent molecules. In fitting the difference x-ray scattering data, we used a detergent micelle consisting of 190 molecules of b-octyl glucoside. This number of detergent molecules was chosen since it formed a stable micelle which completely surrounded a single molecule of bacteriorhodopsin without excess molecules breaking off during simulations. This choice, however, is somewhat arbitrary and there will always be a distribution of micelle sizes within any experimental sample. To account for the influence of micelle-size variations, predictions from Eq. (13) were repeated for a selected movement of helices E and F [c ¼ 1; d ¼ 0; Eq: ð12Þ but with the size of the detergent micelle varying from 170 to 230 b-octyl glucoside molecules, increasing in steps of five detergent molecules additional for each trajectory. Figure 4(a) shows that changes in the detergent micelle size cause variation in the predicted DS pm ðqÞ, but nevertheless the major features of the difference x-ray scattering are consistent for all micelle sizes. Singular value decomposition using Matlab extracted the mean [principle component, Fig. 4

(b), blue line] and variation about
the mean [second SVD component, Fig. 4(b), mustard line, scaled 100 fold for visibility] from the x-ray scattering predictions shown in Fig. 4(a). What is striking is that the major variation about the mean in DS pm ðqÞ reflected by the second SVD component is highly correlated [ Fig. 4(b), red line; Pearson correlation coefficient for 96% for q ! 0:1Å À1 ] with the principal SVD component of micelle-only fluctuations during the last 5 ns of a 20 ns long trajectory [ Fig. 3(e)]. Thus, for all practical purposes, if one wishes to apply corrections to the xray scattering due to micelle fluctuations [the term of Eq. (7) involving DS m ðqÞ, x-ray scattering data represented in Fig. 3(e)] or due to fluctuations associated with changes in the size of the detergent-micelle (extracted as the second SVD component when using Eq. (13) applied to a sequence of trajectories with varying micelle size, Fig. 4), the resulting correction curves are almost indistinguishable for q ! 0:1Å À1 . This observation is presumably due to x-ray scattering changes reflecting the intrinsic length-scale of the detergent micelle in both cases. The inclusion of both terms during minimization would therefore risk a nonphysical situation of both terms having large amplitudes but opposite signs and therefore almost canceling each other. For this reason, we used only one these two corrections during structural fitting [that extracted as the micelle size was varied, mustard line Fig. 4(b)], with the consequence that it is not possible distinguish corrections due to the micelle size fluctuations or those due to micelle structural fluctuations.

Correction for the solvent excluded volume
Predictions for light-induced changes in the x-ray scattering from the protein and detergent micelle must be further corrected due to the influence of the solvent excluded volume [Eq. (11)]. CRYSOL predicts the total x-ray scattering after correcting for the solvent excluded volume by incorporating empirical corrections to the x-ray scattering function for individual atoms 34 and these corrections were developed against a wealth of experimental data from soluble proteins. A detergent micelle typically has a much lower electron density 39 than proteins 40 and it was unclear if the empirical solvent excluded volume correction developed for soluble proteins should also be applied to integral membrane proteins within a detergent micelle. We therefore again utilized molecular dynamics simulations in GROMACS to extract the solvent excluded volume correction for the protein plus micelle system. Specifically, in addition to simulations of the protein and micelle within a box filled with water, we ran parallel unrestrained simulations of a box with water alone. We then superimposed the protein plus micelle coordinates upon the simulation with water alone, selected only those water molecules that were within a sphere of 2 Å radius from an atom of the protein or micelle, and wrote out a pdb file of water molecules alone that fully encompassed the protein and detergent [illustrated in Fig. 5(a)]. From Eq. (9), the total scattering after the excluded volume was calculated whereby the term S solv ðqÞ was extracted from pdb files of only water molecules selected above; and cos U solv was calculated according to Eq. (10). When cutting out spheres of water molecules on all atoms in the protein-micelle complex, there will be an over-counting due to some water atoms which protrude beyond the protein's surface. We, therefore, compensated for over counting the selected water molecules by varying the sphere radius between 1 and 4 Å , and counting the number of water molecules at each radii. From this result we created a linear regression to  Fig. 5(c)] and consequently structural fitting (which treats the overall scaling as a parameter to be optimized) is not very sensitive to those details. Below q ¼ 0:15Å À1 , the GROMACS approach predicts a smaller contrast between the protein and micelle and, as such, the approach developed here provides a low-angle correction when considering a protein-micelle complex.

Convolution with the undulator spectrum
Since polychromatic radiation was used to probe light-induced structural changes in bR, it was essential to account for the effect of the undulator spectrum on the x-ray scattering predictions. This effect was incorporated into the analysis by interpolating all x-ray scattering predictions onto a variable q-axis which varied inversely with the xray energy, and summing these interpolated x-ray scattering predictions according to the amplitude of the undulator spectrum for any given x-ray energy.

Structural fitting to the experimental data
Having developed the above tools and approximations, we searched for a best fit to the first and second experimental basis spectra [ Fig. 2(b)] by varying the Ca perturbations according to the parameters c and d of Eq. (12). As in our earlier approaches, 12,18,24,25 structural refinement minimized the function, for each of the two basis spectra [states 1 and 2, Fig. 1(b)]; wðqÞ is a weighting factor introduced to emphasize regions where the structural content of the difference x-ray scattering data are strongest; and DS expt q ð Þ is the experimental data for each of the two states. The term DS theory q ð Þ is extracted for each value of c and d [Eq. (10)] using the following expression: where the ratio S tot ðqÞ=S pm ðqÞ is used to correct for the influence of the solvent excluded volume  12 and reflects a loss of scattering power at higher angle arising from fluctuations about the mean atomic positions of the atoms of the protein; erf is the error function and the parameter q pm was varied during structural refinement and Dq pm was set as 25% of q pm , and this parameter quantifies the x-ray scattering angle above which the change in x-ray scattering from the micelle may become negligible due to the highly disordered nature of the detergent micelle; and DS mc ðqÞ is a micelle correction term introduced to account for either low-angle corrections due to micelle fluctuations [ Fig. 3(e)] or first order-corrections to the protein-micelle scattering term due to variations in the number of detergent molecules per micelle [ Fig. 4(b), used here throughout structural fitting] and the amplitude, A 2 , was also varied during refinement. For any given 5 ns restrained molecular dynamics trajectory, 100 structural pairs (of 501) were selected with the highest Pearson-correlation scores calculated relative to the experimental data over the domain q ! 0:20Å À1 , and Ca movements associated where CRYSOL is used to calculate S pm q ð Þ from the protein plus micelle, and S solv q ð Þ is calculated from water molecules selected in (a) (blue line is scaled by a factor of 2.1 to best map to the red line). (b) Plot of the number of water molecules selected (blue circles) when using sphere radii from 1.0 to 4.0 Å , whereas 4331 water molecules were selected when using a 2 Å radius, the relative scaling between S solv q ð Þ and S pm q ð Þ in Eqs. (10) and (11) was determined by projecting this plot back to an infinitesimal radius (red line, 0 Å corresponds to 3215 water molecules).

ARTICLE
scitation.org/journal/sdy with these 100 pairs were used in quantifying the most probable conformational changes. Overall, six parameters were varied during refinement (c; d; A 1 ; A 2 ; B P ; q pm Þ, with A 1 optimized for both states simultaneously and, as noted below, it was not necessary to optimize B p . Results of fitting to the experimental data using this procedure are illustrated in Figs. 6(a) and 6(b) as a contour plot showing the changes in R-factor [Eq. (12), calculated with the weighting function w q ð Þ set to unity for all q] as c and d are varied for states 1 and 2, respectively. The minima in these contour plots are identified for state 1 at c ¼ 0:5 6 0.2 and d ¼ 3 6 2:5, where the uncertainty is estimated as the change needed to increase the R-factor by 25% of Rminimum on the contour plot [Figs. 6(c) and 6(d)]. For state 2, the corresponding values were c ¼ 1.0 6 0:2 and d ¼ 3 6 1:5. Whereas the parameter d describing the movement of helix C seems large and very poorly defined, from the selected sub-set of optimal structures we find that the mean Ca displacements on helix C were only 0.4 6 0.3 Å and 0.7 6 0.3 Å for state 1 and state 2, respectively, when calculated from residues 81 to 89 of helix C. By comparison, the crystallographic structures 5B6Z vs 5B6V gives a mean Ca displacement of 0.46 Å for these residues of helix C. Similarly, the mean Ca displacement from residue 152 of helix E to residue 182 of helix F were 1.6 6 0.6 Å and 3.0 6 0.6 Å (Fig. 7) when calculated for the sub-set of optimal structures for state 1 and state 2, respectively. For comparison, pdb entry 6RPH vs 6RQP gives a mean Ca displacement of 3.4 Å (after adjustments for disordered regions described above) for the selected residues of helices E and F.
The micelle damping parameter q pm was refined as 0.129 Å À1 as this value led to the lowest R-factor [Eq. (14)], and this value which can be equated with a B-factor of 8000 Å 2 . Using B ¼ 8p 2 hu 2 i, where u is the amplitude of mean atomic displacement about the median position, this estimates random atomic motions within the membrane as the order of 10 Å , which is compatible with the motions of detergent micelles during the course of a simulated trajectory. By contrast the best fit was recovered when B p , the damping parameter for the protein alone, was made arbitrary small and it was not optimized further. We,  Fig. 1(b), black line]. All fits in panels (a) and (b) apply the same (optimal) scaling ratio, A 1 , defined in Eq. (14). (c) Best-fit x-ray scattering prediction (red line) superimposed upon the experimental data (black circles) for state 1. The blue line resulted when the fitting procedure was repeated without micelle and cross term corrections. (d) Best-fit x-ray scattering prediction (red line) superimposed upon the experimental data (black circles) for state 2. Experimental data were originally presented in Ref. 12. The blue line resulted when the fitting procedure was repeated without micelle and cross term corrections.

ARTICLE
scitation.org/journal/sdy therefore, conclude that the structural fluctuations within the protein were quite well described by structural variations sampled during the Ca-restrained molecular dynamics simulations and the fitting routine did not benefit from the parameter B p , which was otherwise essential when fitting using rigid-motions. 12 Figures 6(c) and 6(d) superimpose the best-fits from this procedure onto the experimental basis-spectra. Optimal predictions for DS theory q ð Þ from Eq. (15) [red line, Figs. 6(c) and 6(d)] appear to capture all aspects of the experimental data DS expt q ð Þ for both basis spectra for state 1 and 2. However, the signal-to-noise ratio for state 2 is better than that of state 1, and this reflects in the R-factor [Eq. (15)] of 33.5% and 20.4% being recovered for states 1 and 2, respectively, when using a weighting of unity over the q-domain 0:15Å q 0:9Å over which the structural fitting was performed. If we remove the micelle-correction terms deriving from Eq. (13) [Fig. 3(f)] and that emerging from variations in the detergent micelle structure [ Fig. 4(b)], then the resulting fits at lower x-ray scattering angle are compromised [red line, Figs. 6(c) and 6(d)] with the R-factors increasing to 52.5% for state 1 and 29.6% for state 2. For comparison, the R-factors reported in Ref. 12 for the intermediate and late states were 32% and 28% respectively, but these were calculated over the smaller q-domain 0:19Å q 0:7Å.

DISCUSSION
Over the last-decade, TR-XSS studies of membrane protein structural changes have become a mature experimental method. 12,18,24,25,28,29 Despite variations in detergent concentrations, variations in photo-excitation levels, difference in signal-to-noise ratios, and other factors associated with sample preparation or the execution of an experiment, the measured difference XSS basis spectra from light-sensitive integral membrane proteins are highly reproducible [ Fig. 2(b)]. A sequence of time-dependent difference-XSS measurements from detergent solubilized membrane proteins [Figs. 2(c) and 2(d)] reveal if secondary structural changes occur and it is usually quite straightforward to extract the rise and decay times for a sequence of structural states. While structural models have been presented from this body of work, the theoretical foundations underpinning the modeling of different XSS data from integral membrane proteins have been somewhat glossed over due to challenges associated with detergent micelles that both fluctuate dramatically and often scatter x-rays more strongly than the protein of interest.
This analysis explicitly addresses this challenge by creating a framework that allows an atomistic description of the protein and detergent micelle using the molecular dynamics package GROMACS. 38,41 As previously, the first step is to identify candidate motions of secondary structural elements. In this analysis, these motions were extracted from recent time-resolved serial crystallography studies of bacteriorhodopsin, 31,32 which makes bacteriorhodopsin a particularly favorable case. It is, however, possible to identify candidate motions from molecular dynamics trajectories [13][14][15]21,23,24,[27][28][29] or potentially building approximate models using experimental results from other biophysical methods. 42 Using selected candidate motions to guide molecular dynamics simulations, we identified how structural fluctuations within the detergent micelle influenced the different x-ray scattering curves [Figs. 3(b)-3(e)] and how changes in the detergent micelle size influenced the difference x-ray scattering (Fig. 4). Since both corrections were similar, they were incorporated as a single micelle correction to the predicted x-ray scattering [DS mc ðqÞ of Eq. (15)]. The most important conceptual advance of the new approach was to interchange the atomic coordinates of the protein micelle when comparing molecular dynamics trajectories for resting and photo-activated structures [ Figs. 1(d)-1(g)]. This allowed a formalism to be established in which the protein-micelle x-ray scattering cross term could be determined without being dominated by structural fluctuations within the micelle [Eq. (13), Fig. 3(f)]. Damping corrections were introduced for the protein-micelle systems [Eq. (15)], solvent contrast corrections were required (Fig. 5), and the x-ray spectrum of the pink-beam undulator had to be explicitly incorporated into the analysis, all of which are consistent with our earlier formalism. 12,18,24,25 Structural conclusions drawn from this approach to molecular fitting of difference x-ray scattering data (Fig. 6) are largely consistent with those previously reported (Fig. 7). The mean Ca displacement helix C was found to be 0.4 6 0.3 Å and 0.7 6 0.3 Å for state 1 and state 2 in this structural analysis, whereas our previous fitting against TR-XSS data 12 yielded the values 0.9 and 1.3 Å for the intermediate and late conformational states, respectively. It seems plausible that the uncertainties in the amplitude of this displacement are large since the Ca coordinate fluctuations throughout the protein within our restrained molecular dynamics simulations are the order of 0.2 Å (Fig. 6). For the movement of the cytoplasmic portions of helix E and F, we recovered a mean Ca displacement of 1.6 6 0.6 Å and 3.0 6 0.6 Å for state 1 and state 2, respectively, whereas we previously concluded that these amplitudes were 2.0 and 2.6 Å for the intermediate and late conformational states of bR. 12 We suggest that agreement within uncertainties in the displacement associated with state 2 and the earlier late conformational state is acceptable.
Time-resolved serial crystallography 4,5 is a rapidly growing field of research and TR-XSS has been established for more than a decade. 11 The theoretical approach presented here underpins earlier assumptions when working with integral membrane proteins 12,18,24,25,28,29 that the presence of detergent micelles does not necessarily affect structural fitting protocols that focus attention upon data in domain q ! 0:2Å À1 . Moreover, this analysis improves the fit to low scattering angle data [Figs. 5(e) and 5(f)] and justifies the assumption that this region is dominated by the influence of the detergent micelle. With the widespread use of rapid x-ray detectors at synchrotron radiation sources 43 and the development of appropriate microfluidic mixing technologies for reaction initiation, it should become possible to extend the field of different XSS studies of integral membrane proteins to incorporate protein and substrate or reagent mixing studies to initiate timeresolved studies. Since caged compounds have also been successfully used for such studies, 28,29 the sub-field of TR-XSS has the potential to grow beyond the study of naturally light-driven systems. Because all future experimental developments require solid theoretical foundations and because the detergent micelle may itself be influenced by mixing, the analysis tools and fitting protocols developed here should provide a framework that allows the field to look to the future with confidence.

MATERIALS AND METHODS Sample preparation
Sample preparation bR was solubilized in b-octylglucoside and concentrated to 1.3 mM, as judged from the bR absorption peak at 570 nm, as described in more detail in earlier studies. 12 TR-XSS data acquisition and processing TR-XSS data were collected at beamline ID09 of the European Synchrotron Radiation Facility (ESRF). Samples bR concentrated to 35 mg/ml were delivered across a polychromatic x-ray beam produced from an x-ray undulator (18 keV, pink beam, DE/E % 4%) through a quartz capillary (0.5 mm diameter, Hampton Research). Samples were driven using a motorized syringe pump (neMESYS) with a flow rate of 3 ll/s. X-ray pulse trains of 1 or 5 ls in duration were isolated from a continuous filling of the storage ring using an x-ray chopper. X-ray scattering images were recorded on a Rayonix MX170-HS detector (1920 Â 1920 pixels, pixels binned 2 Â 2 to a pixel-size of 88.5 lm) located 350 mm from the sample position. X-ray scattering data were collected at room temperature ($20 C). Data were merged over the domain 0.025 Å À1 < q < 3.0 Å À1 .
Samples of bacteriorhodopsin were photo-activated using a 532 nm laser pulse 4 ns in duration with 500 lJ/pulse focused into a spot size of 0.31 Â 1.7 mm 2 (full width half maximum). The triggering and timing of the green laser pulse and the x-ray detector were integrated into the beamline control system. X-ray scattering data were integrated in concentric rings 12 prior to normalization about the isosbestic points at q ¼ 1.6 Å À1 . Outlier rejection was performed in two steps: the ring-integrated absolute scattering curves were rejected if they deviated more than 10% from the median value in the q-range 2.0-2.5 Å À1 , followed by an outlier rejection scheme where difference scattering curves (ON minus OFF) where rejected if they deviated by more than three standard deviations from the mean of a given run. Typically, 4%-10% of all the data were rejected, depending upon the experimental run. The influence of laser-induced heating was measured and removed using data recorded using an IR laser (k ¼ 1470 nm) to heat the sample. At this wavelength heat was deposited, the IR light did not photo-isomerize the retinal chromophore. Heating data were collected and were processed in the same fashion as described for the photo-activated datasets. Heat free light-induced difference scattering data were then computed by scaling and removing the thermal signal from each time-point, where the scaling factor is chosen to minimize the difference x-ray scattering differences over the domain 0:51Å

Molecular dynamics simulations using GROMACS
Monomeric bacteriorhodopsin protein/micelle complexes were built into a b-octylglucoside membrane using CHARMM-GUI Micelle Builder. 44,45 A range of protein/micelle complexes were built, including 170,175,180,185,190,195,200,205,210,215,220,225, and 230 boctylglucoside molecules, with simulations using 190 b-octylglucoside molecules being preferred for most of the analysis. Lipid ring and protein surface penetration was checked by CHARMM as part of the system building options. The system was built within a solvent box consisting of 20 Å in all dimensions, using three-site CHARMM modified TIP3 water model and a concentration of 0.15 M of NaCl. GROMACS input files were created by CHARMM. 46,47 GROMACS 2019.1 37,38 was used to run restrained molecular dynamics simulations of bR using the CHARMM36 force-field. 48 Energy minimization was carried out first using LINCS constraints algorithm with constraints on hydrogen bonds. The second stage was equilibration of the system in six steps with energy and dihedral restraints on the b-octylglucoside molecules. No movement of the protein backbone was allowed during the equilibration and production steps. Molecular dynamics trajectories of 20 ns in duration (preparative runs) or 5 ns (fitting runs) were used to produced 2000 or 500 pdb files, respectively, each pdb file separated by 10 ps. These pdb files were used to calculate the theoretical scattering terms for protein, membrane, and protein plus membrane. Unrestrained molecular dynamics simulations involving only water were also performed using standard settings in GROMACS 2019.1.